Finding, Assessing, and Integrating Statistical Sources for Data Mining
Abstract
As the knowledge discovery process is applied in a growing variety of domains, there is an increasing opportunity to use the Linked Open Data (LOD) cloud as a primary data source for knowledge discovery. The key challenges are finding the relevant data across various sources and then using that data for the desired analysis. The availability of statistical data and indicators (e.g. social, economic) in the LOD has increased strikingly, and the Cube ontology has become the de facto standard for describing them according to a multi-dimensional model. In this paper we discuss a detailed scenario for using the LOD as a primary source of data for building analysis models in the Peacebuilding domain. Next, we present an approach to finding potentially relevant cube datasets in the LOD cloud, assessing their compatibility, and then integrating the compatible datasets to enable the application of data
Similar resources
LiDDM: A Data Mining System for Linked Data
In today’s scenario, the quantity of linked data is growing rapidly. The data includes ontologies, governmental data, statistics and so on. With more and more sources publishing the data, the amount of linked data is becoming enormous. The task of obtaining the data from various sources, integrating and fine-tuning the data for desired statistical analysis assumes prominence. So there is need o...
Integrating AHP and data mining for effective retailer segmentation based on retailer lifetime value
Data mining techniques have been used widely in the area of customer relationship management (CRM). In this study, we have applied data mining techniques to address a problem in a business-to-business (B2B) setting. In a manufacturer-retailer-consumer chain, a manufacturer should improve its relationship with retailers to continue its business. Segmentation is a useful tool for identifying groups...
Ratio Rule Mining from Multiple Data Sources
Both multiple source data mining and streaming data mining problems have attracted much attention in the past decade. In contrast to traditional association-rule mining, to capture the quantitative association knowledge, a new paradigm called Ratio Rule (RR) was proposed recently. We extend this framework to mining ratio rules from multiple source data streams which is a novel and challenging p...
Data Mining in the Presence of Quantitatively and Qualitatively Diverse Information
The work under this grant has established several concepts for working with diverse data. As a first step, abstractions have been developed that appropriately represent the richness of data types such as sequence and graph data, and their combination with conventional data types such as Boolean (or item) data. As a second step towards integrating diverse data, techniques have been developed for...
Heavy metal pollution and identification of their sources in soil over Sangan iron-mining region, NE Iran
The aim of this study was to determine the extent of metal pollution and to identify its major sources in the vicinity of the Sangan iron mine in NE Iran. Soil samples were collected from the vicinity of the mine site and analyzed for heavy metals. In addition, the chemical speciation of these metals was investigated by means of the sequential extraction procedure. The st...